Unlabeled Data Does Provably Help

نویسندگان

  • Malte Darnstädt
  • Hans Ulrich Simon
  • Balázs Szörényi
چکیده

A fully supervised learner needs access to correctly labeled examples whereas a semi-supervised learner has access to examples part of which are labeled and part of which are not. The hope is that a large collection of unlabeled examples significantly reduces the need for labeled-ones. It is widely believed that this reduction of “label complexity” is marginal unless the hidden target concept and the domain distribution satisfy some “compatibility assumptions”. There are some recent papers in support of this belief. In this paper, we revitalize the discussion by presenting a result that goes in the other direction. To this end, we consider the PAC-learning model in two settings: the (classical) fully supervised setting and the semi-supervised setting. We show that the “label-complexity gap” between the semi-supervised and the fully supervised setting can become arbitrarily large for concept classes of infinite VC-dimension (or sequences of classes whose VC-dimensions are finite but become arbitrarily large). On the other hand, this gap is bounded by O(ln |C|) for each finite concept class C that contains the constant zeroand the constant one-function. A similar statement holds for all classes C of finite VC-dimension. 1998 ACM Subject Classification I.2.6 Concept Learning

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

We study the potential benefits of unlabeled data to classification prediction to the learner. We compare learning in the semi-supervised model to the standard, supervised PAC (distribution free) model, considering both the realizable and the unrealizable (agnostic) settings. Roughly speaking, our conclusion is that access to unlabeled samples cannot provide sample size guarantees that are bett...

متن کامل

Artemia: a family of provably secure authenticated encryption schemes

Authenticated encryption schemes establish both privacy and authenticity. This paper specifies a family of the dedicated authenticated encryption schemes, Artemia. It is an online nonce-based authenticated encryption scheme which supports the associated data. Artemia uses the permutation based mode, JHAE, that is provably secure in the ideal permutation model. The scheme does not require the in...

متن کامل

Learning Safe Prediction for Semi-Supervised Regression

Semi-supervised learning (SSL) concerns how to improve performance via the usage of unlabeled data. Recent studies indicate that the usage of unlabeled data might even deteriorate performance. Although some proposals have been developed to alleviate such a fundamental challenge for semisupervised classification, the efforts on semi-supervised regression (SSR) remain to be limited. In this work ...

متن کامل

Statistical Analysis of Semi-Supervised Regression

Semi-supervised methods use unlabeled data in addition to labeled data to construct predictors. While existing semi-supervised methods have shown some promising empirical performance, their development has been based largely based on heuristics. In this paper we study semi-supervised learning from the viewpoint of minimax theory. Our first result shows that some common methods based on regulari...

متن کامل

Active Learning : Bandits

In supervised learning, the goal of the learning algorithm is to learn a predictor, h : X → Y , which accurately predicts labels of future instances. In the traditional PAC learning model, the learner receives a training set of examples which are sampled i.i.d. from an unknown distribution D over X × Y . The learner has no control on which examples he receives. In active learning the learner in...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013